Dis-S2V: Discourse Informed Sen2Vec

Authors

  • Tanay Kumar Saha
  • Shafiq R. Joty
  • Naeemul Hassan
  • Mohammad Al Hasan
Abstract

Vector representation of sentences is important for many text processing tasks that involve clustering, classifying, or ranking sentences. Recently, distributed representations of sentences learned by neural models from unlabeled data have been shown to outperform the traditional bag-of-words representation. However, most of these learning methods consider only the content of a sentence and largely disregard the relations among sentences in a discourse. In this paper, we propose a series of novel models for learning latent representations of sentences (Sen2Vec) that consider the content of a sentence as well as inter-sentence relations. We first represent the inter-sentence relations with a language network and then use the network to induce contextual information into the content-based Sen2Vec models. Two different approaches are introduced to exploit the information in the network. Our first approach retrofits (already trained) Sen2Vec vectors with respect to the network in two different ways: (i) using the adjacency relations of a node, and (ii) using a stochastic sampling method, which is more flexible in sampling the neighbors of a node. The second approach uses a regularizer to encode the information in the network into the existing Sen2Vec model. Experimental results show that our proposed models outperform existing methods in three fundamental information system tasks, demonstrating the effectiveness of our approach. The models leverage the computational power of multi-core CPUs to achieve fine-grained computational efficiency. We make our code publicly available upon acceptance.
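To make the adjacency-based retrofitting idea in the abstract more concrete, here is a minimal sketch in Python. It is not the authors' implementation: the abstract does not give the update rule, so this assumes one common formulation of retrofitting, in which each vector is repeatedly replaced by a weighted average of its original value and the current vectors of its graph neighbors. The function name `retrofit`, the weights `alpha` and `beta`, and the toy sentence ids are all hypothetical.

```python
from typing import Dict, List

Vector = List[float]


def retrofit(
    vectors: Dict[str, Vector],
    neighbors: Dict[str, List[str]],
    alpha: float = 1.0,   # assumed weight on the original (pre-trained) vector
    beta: float = 1.0,    # assumed uniform weight on each neighbor
    iterations: int = 10,
) -> Dict[str, Vector]:
    """Pull each sentence vector toward its neighbors in the network.

    Assumed update rule (not taken from the paper):
        q_i = (alpha * q_i_original + beta * sum_j q_j) / (alpha + beta * deg_i)
    where j ranges over the neighbors of sentence i that have a vector.
    """
    # Copy the inputs so the caller's vectors are never modified.
    original = {k: list(v) for k, v in vectors.items()}
    current = {k: list(v) for k, v in vectors.items()}

    for _ in range(iterations):
        for node, nbrs in neighbors.items():
            # Ignore neighbors (and nodes) that have no vector.
            known = [n for n in nbrs if n in current]
            if node not in current or not known:
                continue
            total_weight = alpha + beta * len(known)
            current[node] = [
                (alpha * original[node][d]
                 + beta * sum(current[n][d] for n in known)) / total_weight
                for d in range(len(original[node]))
            ]
    return current


# Toy example: s1 and s2 are adjacent; s3 has no neighbors and stays fixed.
toy_vectors = {"s1": [0.0, 0.0], "s2": [2.0, 2.0], "s3": [10.0, 10.0]}
toy_graph = {"s1": ["s2"], "s2": ["s1"]}
retrofitted = retrofit(toy_vectors, toy_graph, iterations=50)
```

In this sketch, adjacent sentences move toward each other while remaining anchored to their content-based vectors, and sentences without neighbors keep their original vectors. The stochastic-sampling variant and the regularizer-based approach mentioned in the abstract are not covered here.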


Similar resources

Con-S2V: A Generic Framework for Incorporating Extra-Sentential Context into Sen2Vec

We present a novel approach to learn distributed representation of sentences from unlabeled data by modeling both content and context of a sentence. The content model learns sentence representation by predicting its words. On the other hand, the context model comprises a neighbor prediction component and a regularizer to model distributional and proximity hypotheses, respectively. We propose an...


Benchmarking Still-to-Video Face Recognition via Partial and Local Linear Discriminant Analysis on COX-S2V Dataset

In this paper, we explore the real-world Still-to-Video (S2V) face recognition scenario, where only very few (single, in many cases) still images per person are enrolled into the gallery while it is usually possible to capture one or multiple video clips as probe. Typical application of S2V is mug-shot based watch list screening. Generally, in this scenario, the still image(s) were collected un...


Serro 2 Virus Highlights the Fundamental Genomic and Biological Features of a Natural Vaccinia Virus Infecting Humans

Vaccinia virus (VACV) has been implicated in infections of dairy cattle and humans, and outbreaks have substantially impacted local economies and public health in Brazil. During a 2005 outbreak, a VACV strain designated Serro 2 virus (S2V) was collected from a 30-year old male milker. Our aim was to phenotypically and genetically characterize this VACV Brazilian isolate. S2V produced small roun...


An Analysis of Iranian EFL Learners’ Dis-preferred Responses in Interactional Discourse

The present study, on the one hand, attempted to investigate the strategies applied in dispreferred responses by Iranian university students of English and the extent to which pragmatic transfer could occur. On the other hand, the study aimed to probe into the association between dispreferred organization and turn-shape. To this end, 31 relevant naturally occurring conversations, totaling 120 ...


Evaluating Hierarchical Discourse Segmentation

Hierarchical discourse segmentation is a useful technology, but it is difficult to evaluate. I propose an error measure based on the word error rate of Beeferman et al. (1999). I then show that this new measure not only reliably distinguishes baseline segmentations from lexically-informed hierarchical segmentations and more informed segmentations from less informed segmentations, but it also of...



Journal:
  • CoRR

Volume abs/1610.08078

Publication date 2016